We describe the interaction of three aspects core to a personalized scheduling
task. First, we develop a preference model designed to capture user preferences for
the task of scheduling a meeting request between multiple people, and a methodology
for preference elicitation to initially populate this model. Second, we explain
a natural-language-based elicitation of the meeting request details and constraints,
and outline the solving of the resulting constrained scheduling problem (with
preferences). Third, we describe the display of solutions to the scheduling problem to
the user, as candidate scheduling options with explanations, and detail unobtrusive
learning of revisions to the preference model from the user’s choices among the
candidates. We describe the user studies that informed our design choices, and assess
the resulting system in terms of the quality of scheduling options presented, according
to the user. The scheduling task enabled by the integration of these aspects has
been implemented within a deployed application.

1 INTRODUCTION

All too often, arranging meetings in an office environment turns into a tedious process. One
common method that people employ is a series of emails to propose, reject, counter-propose,
and eventually agree on a time. The effort consumed in such practices motivates the opportunity
for automated assistance in scheduling.

Although a number of fully- or semi-automated scheduling systems have been developed, they
have suffered from low adoption rates for two main reasons [14, 32, 40]: they fail to account for
the personal nature of scheduling, or they demand too much control of an important aspect of an
individual’s working world.

The personal nature of scheduling is most directly seen in situations where a user’s meeting
request cannot be fully satisfied. For example, suppose that Alice requests a 90 minute meeting
with Bob and Chris next Tuesday afternoon, but no time is available to all during that period.
Some users may prefer the option of a shortened 60 minute meeting, while others (like Alice)
may prefer the full 90 minute meeting Tuesday morning. Most users would like to be presented
with both options (more generally, with all relevant options, of which there can be many in
such over-constrained situations), but such that the options they most prefer have priority in the
presentation. It is precisely in these difficult, over-constrained situations, in which there may be
many competing factors and many possible relaxations of the meeting request, where scheduling
assistance is most useful. But to be of real value here, an automated scheduling assistant must
actively account for a user’s scheduling preferences.

This report describes our work on modelling, eliciting, learning, and reasoning about the
scheduling preferences of an individual. The approach we take is to combine initial, lightweight
elicitation of scheduling preferences with unobtrusive, online refinement of them. This personalized
preference model informs a constraint-based scheduling engine that computes and presents
options in response to a meeting request. The user may select one of these options, request
further options, or refine her meeting request. By initially eliciting the user’s preferences, the
system can offer reasonable scheduling options from first use; by refining its knowledge, the
system can become progressively more capable and trustworthy over time [20].

While prior research has looked at one or more of these aspects (modelling and eliciting,
learning, and reasoning), the resulting systems have rarely sought to encompass all three. For
example, some systems represent and learn user preferences but do not perform sophisticated
reasoning to offer scheduling options [25], while others represent preferences and perform
constraint reasoning but do not update the preferences by learning [13].

The key contribution of this report is a description and analysis of a preference model that
balances the elicitation, learning, and constraint reasoning trade-offs within a semi-automated
scheduling system. First, for elicitation, more expressive models better capture the nuances of
user preferences, but require more effort to specify. Second, for learning, expressive models
require significant training (and tend to overfit), while inexpressive models cannot distinguish
between candidate schedules that are distinguished by the user. Third, for automated searching
for and ranking of candidate schedules (hereafter, constraint reasoning), it is more difficult to
define and reason over complex objective functions, preferences, and constraints.

Based on our preference model, the end-to-end semi-automated scheduling assistance enabled
by the integration of these three aspects has been implemented within a deployed system called
PTIME [2]. PTIME, a component of a larger cognitive assistant named CALO [29], manages
the calendar and the scheduling requests of a CALO user in a mixed-initiative manner. Figure 1
shows the process. PTIME elicits scheduling preferences (step 0); elicits a meeting request (step
1); computes candidate schedules (possibly relaxations) in response to the request, by means
of a Constraint Reasoner module, and displays a subset of the candidates to the user as options
(step 2); and accepts the user’s choice of the desired schedule option (step 3). Based on which
option the user chooses for each request among the presented options, the learning module in
PTIME updates the parameters of the preference model instance (step 4) [9]. The updated model
becomes the basis of reasoning over candidate schedules for the next scheduling request.

Figure 1: Use of the preference model in PTIME.

The dashed arrow from step 3 back to step 2 indicates the user’s ability to reject all the
presented schedule options and revise her scheduling request. She might do so because she finds
none of the options satisfactory, or because seeing the options has stimulated her to refine her
request or explore an alternative.

At present, the PTIME user organizing the meeting decides which meeting option to select,
taking into consideration other participants’ generic scheduling preferences. In the future, she
will be able to also take others’ meeting-specific preferences into consideration. The selected
meeting option is presented to invitees for inclusion or otherwise in their calendars. In the
simplest form of negotiation supported by PTIME, the other participants besides the organizer
may simply accept the meeting request or not. The system is being developed to support more
elaborate forms of negotiation.

We begin by describing our first user studies and the requirements on the preference model
that we derived from them. We then describe our model, based on Multi-Attribute Utility Theory
(MAUT) [19]. Next, we describe the interfaces and design choices for eliciting both an initial
instance of the model (preference elicitation), and the specification of and constraints on meeting
requests (problem elicitation). The following sections describe the presentation of schedule
options and the learning based on user choices, and then report a set of studies and experiments
we conducted to validate (or otherwise) the three aspects of model and elicitation, learning, and
reasoning, and the overall user experience of the PTIME system.

2 DEVELOPMENT OF A BALANCED PREFERENCE MODEL

An earlier model of user scheduling preferences in the PTIME system, reported in [9], sought to
capture temporal preferences (day and time of events) for under-constrained meeting requests. It
was composed as a weighted linear sum of features such as meeting start and end times. Building
a constraint problem representation and reasoning over it, to find preferred schedule options
according to the model, was straightforward. Refining a model instance, likewise, proved effective
using support vector machine learning techniques [17]. Despite these positive aspects, the model
could not capture non-temporal preferences, such as for meeting participants, and its
expressiveness was too limited to capture how temporal and non-temporal criteria interact. Compared
to the under-constrained case, these two aspects have a significant role for over-constrained
meeting requests.

2.1 User Studies on Scheduling Habits

To gain insight into how individuals arrange meetings, and the factors that influence their
decision-making regarding how to propose and respond to meeting requests, we conducted an informal
user study. The study consisted of two parts: an in-situ diarying by the subjects, and a post-hoc
evaluation of the diary combined with a semi-structured interview. We did not expect substantive
quantitative data to result from the diaries; rather, our aim in the study was to understand
the qualitative factors influencing the user’s scheduling practice: both their individual nature and
their mutual interaction.

Eleven subjects took part in the study, all members of our organization. Their roles included
software engineer, engineering manager, researcher, program manager, and administrative staff.
We asked the subjects to maintain a diary over one week’s worth of scheduling activities, keeping
track of how meetings were scheduled, what decisions were made and why they were made. We
targeted some subjects with crowded, constrained schedules. Two of the subjects were unable to
perform the diarying, either due to excessive busyness (despite the low overhead of the exercise),
or because their schedules are managed by their administrators. We nonetheless interviewed
these subjects regarding their scheduling preferences. In one case, we asked an administrator to
perform the diarying and interviewed her with respect to scheduling for the individual she assists.
The fivefold foci of the study were event characteristics (e.g., one-on-one vs. group meetings),
scheduling processes (e.g., iterated refinement of a time), decision factors (e.g., relationship to
meeting host), preferences, and scheduling needs.

Meetings/events. We found that the subjects perceive three dimensions to the meetings on
their calendars: (1) one-on-one vs. group (one-on-one meetings are often walk-in/impromptu
while group meetings are scheduled in advance); (2) mandatory vs. optional attendance (the
subjects often place optional events (e.g., seminars) on their calendar, with the understanding that
they may be deleted/overlapped); and (3) fixed vs. floating (a sample floating event is “exercise
at least twice a week”).

Process. We found that the subjects schedule events in a variety of ways, including: (1) walk-in
(most often used for one-on-one and for critical meetings); (2) constraint satisfaction (used
for group meetings: e.g., host suggests a time window and asks for everyone’s constraints; if
there is a viable time, the meeting is set; otherwise, the process repeats); (3) iterated refinement
(used for one-on-one meetings: e.g., A suggests tomorrow, B says afternoon, A asks “2pm?”,
B responds “how about 3pm?”, A says okay); (4) add-in (used for meeting announcements
and recurrent meetings); and (5) soft scheduling, i.e., not placed in the calendar (often used for
tasks/todos and other floating events).

Factors. We found that the subjects take into consideration a handful of factors that are a
subset of a broadly common list, when scheduling a meeting and when deciding whether to
accept a meeting request. We group the factors into seven categories: (1) importance (of the
meeting; its importance to you, your importance to the meeting); (2) urgency/criticality; (3)
interest; (4) relationships (host, relationship to host; other participants, relationship to other
participants [40]); (5) perturbation (effect on other meetings); (6) stability (often characterized
as “number of times the meeting has been rescheduled”); and (7) preferences (of the individual
and of others). The relative importance of the features varied considerably between subjects.

Preferences. We found that the subjects explicitly indicated preferences for or against some
specific features. We group them into four categories: (1) general meeting-specific preferences
(e.g., time of day); (2) general calendar-wide preferences (e.g., fragmentation, density, number
of meetings per day); (3) preferences over how to relax meeting request constraints (e.g.,
proximity to specified time, proximity to specified duration, attendance of high-priority
participants); and (4) preferences over how to relax calendar-wide constraints (e.g., no overlaps, no
Friday p.m. meetings because of childcare situations, no late meetings on carpool days).

Needs. We concluded the interviews by asking “What do you most want out of a scheduling
assistant?” The most common responses were: (1) coordinating meetings between a group
of busy people; (2) intelligent reminders; (3) transparency into assisted scheduling decisions
(e.g., explanations of conflicts and learned preferences); and (4) greater control over scheduling
processes (e.g., accepting requests, separation of participants’ availability and preferences).

Among other studies of user scheduling habits and (semi-)automated calendaring tools, Palen
[32], for instance, examined the use in situ of group calendaring software at Sun Microsystems.
Several researchers report that people are reluctant to invest in accurately and fully informing a
scheduling system of their preferences (to the extent that they can articulate them) unless either
(1) the process is not burdensome and they are persuaded of the benefit; or (2) they are mandated
to do so [14, 40]. Other studies support our finding that even when people are confident in
the behavior and the decisions of a (semi-)automated system, they seek transparency into its
reasoning [1, 32].

Our own study and prior work also demonstrate that evaluation of scheduling options is
contingent on multiple criteria and their interaction [20, 3]. Accounting for the relative importance
of these features is crucial if we are to offer the user desirable relaxations for over-constrained
requests. The models of [13, 9, 26] and others assume no interaction between the criteria. This
assumption simplifies the learning and solving aspects (to the extent that earlier work accounting
for preferences makes use of both learning and reasoning), but it limits the expressiveness of
the model.

A set of requirements for a preference model results from these various ethnographic
investigations. First, the model must be populated from intuitive (and thus likely qualitative) statements
expressed in terms of concepts that the user is familiar with in the domain, i.e., events, people,
and calendars. Second, the expressiveness of the model must be such that it can capture enough
of user scheduling preferences to enable the personalized scheduling task. Third, the preference
model must be explainable to the user, again in terms of familiar, domain-relevant concepts [25].
Fourth, the preference model must be able to express multiple criteria that factor into the user’s
decisions, and the interaction between the criteria, such as between the meeting duration and
the temporal preferences of other participants. Finally, as we have argued, the model must be
tractable for reasoning and learning, if a practical scheduling assistant is to be built.

2.2 Choquet Integral Model of Scheduling Preferences

The preferences literature is rich with qualitative and quantitative models of varying expressive
power and computational tractability; for surveys, we refer to [31, 10]. The reasoning and
learning methods we wish to bring to bear on scheduling problems are quantitative, and the
features of the preferences are not independent. Suitable models can be based on, for instance,
Generalized Additive Utility (GAI) [7] or Multi-Attribute Utility Theory (MAUT) [19].

To balance the competing aspects of expressiveness and elicitation, learning, and reasoning,
and to meet the above requirements, we chose to adopt the global utility function approach
of MAUT. By providing a single function by which the system can rate alternative schedules,
a MAUT approach is in principle amenable to the schedule evaluation and preference learning
components. The challenge of adopting any approach is to build a model suited to the scheduling
domain, and to adapt reasoning and learning algorithms to it.

A MAUT model is specified by a set of $n$ criteria, a set $u_1, \ldots, u_n$ of (partial) utility
functions that make the criteria commensurate, and an aggregation function $F$. An instance of the
model, i.e., a capturing of the preferences of an individual, is specified by the coefficients of the
aggregation function. For example, writing $z_j = u_j(x_j)$, where $x_j$ is a value for criterion $j$, then
$F(z_1, \ldots, z_n) = \sum_i a_i z_i$ for some set of coefficients $a_i$, corresponds to aggregation by a linear
weighted sum.
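
As a minimal sketch of the MAUT structure just described, the following Python fragment maps raw
criterion values to commensurate utilities in [0, 1] and aggregates them by a linear weighted sum.
The two criteria, the shapes of their partial utility functions, and the weights are illustrative
assumptions made for this sketch; they are not values taken from PTIME.

```python
from typing import Callable, Dict

# Hypothetical partial utility functions u_j: raw criterion value -> [0, 1].
def duration_utility(minutes: float, requested: float = 90.0) -> float:
    """1.0 when the full requested duration is granted, falling linearly to 0."""
    return max(0.0, min(1.0, minutes / requested))

def start_time_utility(hour: float, preferred: float = 14.0, tolerance: float = 4.0) -> float:
    """1.0 at the preferred start hour, falling linearly to 0 `tolerance` hours away."""
    return max(0.0, 1.0 - abs(hour - preferred) / tolerance)

def weighted_sum(z: Dict[str, float], a: Dict[str, float]) -> float:
    """Aggregation F(z_1, ..., z_n) = sum_i a_i * z_i."""
    return sum(a[name] * z[name] for name in z)

def rate(candidate: Dict[str, float],
         utilities: Dict[str, Callable[[float], float]],
         weights: Dict[str, float]) -> float:
    """Make the criteria commensurate (z_j = u_j(x_j)), then aggregate."""
    z = {name: u(candidate[name]) for name, u in utilities.items()}
    return weighted_sum(z, weights)

UTILITIES = {"duration": duration_utility, "start": start_time_utility}
WEIGHTS = {"duration": 0.6, "start": 0.4}        # assumed model instance

shortened = {"duration": 60.0, "start": 14.0}    # 60 minutes, in the requested afternoon
moved = {"duration": 90.0, "start": 10.0}        # full 90 minutes, moved to the morning

score_shortened = rate(shortened, UTILITIES, WEIGHTS)  # about 0.8
score_moved = rate(moved, UTILITIES, WEIGHTS)          # about 0.6
```

With these assumed weights the shortened afternoon meeting is rated above the full-length morning
meeting; a user who prefers to keep the full duration would correspond to an instance with a larger
duration weight.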

A weighted sum cannot express interaction between criteria; it assumes that all criteria are
preferentially independent [19]. An aggregation function that avoids this assumption and that
satisfies certain desirable properties is the Choquet integral [12, 21]. The Choquet integral
subsumes a weighted sum, and is able to express multi-criteria trade-offs such as Pareto optimal
decisions that a weighted sum cannot represent. We now explain how we define a scheduling
preference model based on this representation.

From our first user study reported above, augmented by features suggested in prior work in
the literature (e.g., [20, 25, 3]), we identified seven criteria consistent across different users: (1)
scheduling windows for the requested meetings; (2) durations of meetings; (3) overlaps, ordering
constraints, and conflicts between requested and existing meetings; (4) locations of meetings; (5)
participants in meetings; (6) time or duration changes for existing meetings; and (7) preferences
of others participating in new meetings or rescheduled existing meetings [8].

We deliberately chose criteria expressed in terms of concepts familiar to the user in the
domain to facilitate elicitation of instances of the preference model and explanation of learned
preferences. Below we describe the formulation of the commensurate utility functions $u_i$. Other
criteria, including the stability of a candidate schedule (how stable it is with respect to new
meetings and meetings that run long) and how much the candidate schedule perturbs the existing
schedule, would add richness to the preference model. However, as we will describe, we
found that the richer model would not be amenable to constraint solving.

With the criteria specified, we next define the Choquet integral aggregation function simplified
to this context.

Definition 1. Let $z_i = u_i(x_i)$, let $N$ be the set of criteria, and let $I \subseteq N$ be a subset of the
criteria. The Choquet integral is

$$F(z_1, \ldots, z_n) = \sum_{I \subseteq N} a_I \bigwedge_{j \in I} z_j \qquad (1)$$

where $a_I$ is a coefficient representing the degree and type of interaction of the criteria in $I$, and
where $\wedge$ denotes conjunction. □

Over $n$ criteria, specification of the general Choquet integral requires $2^n$ coefficients. A
k-additive or k-order Choquet integral considers only the interactions of criteria sets with $k$ or
fewer criteria. It trades expressiveness of the model for its easier specification. Practical
applications indicate that the 2-order case is usually sufficient [24, 22]. Only $n + \binom{n}{2} = n(n+1)/2$
coefficients are required to specify the integral. In this case, (1) can be written as

$$F(z_1, \ldots, z_n) = \sum_{i \in N} a_i z_i + \sum_{\{i,j\} \subseteq N} a_{ij} (z_i \wedge z_j) \qquad (2)$$

where the coefficients $a_i$, $i \in N$, and $a_{ij}$, $\{i, j\} \subseteq N$, fully specify the model. The conjunction
operator $\wedge$ can be specialized to minimum. Details of the derivation are given in [24].
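
Equation (2), with the conjunction specialized to minimum, is short enough to evaluate directly.
The following Python sketch does so for an arbitrary number of criteria. The function name, the
zero-based criterion indices, and the three-criterion coefficient values are assumptions chosen for
illustration; they are not coefficients elicited or learned by PTIME.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

def choquet_2order(z: Sequence[float],
                   a: Sequence[float],
                   a_pair: Dict[FrozenSet[int], float]) -> float:
    """Evaluate F(z) = sum_i a_i z_i + sum_{i<j} a_ij * min(z_i, z_j).

    z      -- commensurate utilities z_i, one per criterion
    a      -- importance coefficients a_i, one per criterion
    a_pair -- interaction coefficients a_ij keyed by frozenset({i, j});
              pairs that are absent are treated as a_ij = 0
    """
    n = len(z)
    if len(a) != n:
        raise ValueError("need exactly one importance coefficient per criterion")
    singles = sum(a[i] * z[i] for i in range(n))
    pairs = sum(a_pair.get(frozenset((i, j)), 0.0) * min(z[i], z[j])
                for i, j in combinations(range(n), 2))
    return singles + pairs

# Assumed instance over three criteria (indices 0, 1, 2).
a = [0.5, 0.3, 0.1]
a_pair = {
    frozenset((0, 1)): 0.2,    # positive: criteria 0 and 1 are complementary
    frozenset((1, 2)): -0.1,   # negative: criteria 1 and 2 are substitutive
}

score = choquet_2order([0.8, 0.5, 1.0], a, a_pair)   # about 0.70
ideal = choquet_2order([1.0, 1.0, 1.0], a, a_pair)   # about 1.0
```

When `a_pair` is empty the function reduces to the linear weighted sum, which is the sense in which
the Choquet integral subsumes it.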

In the 2-order case, the coefficient $a_i \in [0, 1]$ describes the relative importance of criterion $i$ (a
greater value indicates greater relative importance), while $a_{ij} \in [-1, 1]$ describes the interaction
between criteria $i$ and $j$. $a_{ij} > 0$ indicates that $i$ and $j$ are complementary criteria, while $a_{ij} < 0$
indicates that they are substitutive; $a_{ij} = 0$ indicates no correlation between the two.
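
A small worked example may help to show what a positive interaction coefficient adds over a
weighted sum. It uses two criteria with minimum as the conjunction; the three utility profiles and
the coefficient values are illustrative assumptions, not data from our studies.

```latex
\begin{align*}
&\text{Candidates: } A = (1, 0), \quad B = (0, 1), \quad C = (\tfrac{1}{2}, \tfrac{1}{2}). \\
&\text{Weighted sum: } F(A) = a_1, \quad F(B) = a_2, \quad
  F(C) = \tfrac{1}{2}(a_1 + a_2) \le \max(a_1, a_2), \\
&\text{so no choice of } a_1, a_2 \text{ ranks } C \text{ strictly above both } A \text{ and } B. \\
&\text{2-order Choquet: } F(z_1, z_2) = a_1 z_1 + a_2 z_2 + a_{12} \min(z_1, z_2). \\
&\text{With } a_1 = a_2 = 0.3, \; a_{12} = 0.4: \quad
  F(A) = F(B) = 0.3, \quad F(C) = 0.15 + 0.15 + 0.2 = 0.5.
\end{align*}
```

Here $a_{12} > 0$ rewards a schedule that does reasonably well on both criteria at once, which is the
complementary case; a negative $a_{12}$ would instead discount such joint satisfaction.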

We make the hypothesis that a 2-order model trades off expressiveness (being unable to
express interaction effects among three or more criteria) with ease of model specification and
explanation in a way suitable for (1) a constraint-based representation of the scheduling problem,
(2) reasoning over this problem representation, and (3) learning revisions to the model. With
$n = 7$ criteria, an instance of the model is specified by $\frac{1}{2} \cdot 7(7 + 1) = 28$ coefficients. Note,
however, that the reasoning and learning described in the sequel do not depend on a 2-order model,
other than that their complexity increases as the number of Choquet coefficients increases.
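
For convenience, the coefficient counts can be checked directly. The derivation below uses only
$n = 7$ from the text; the comparison with the general case is added for illustration.

```latex
\begin{align*}
\text{2-order:} \quad & n + \binom{n}{2} = n + \frac{n(n-1)}{2} = \frac{n(n+1)}{2},
  & n = 7: \quad & 7 + 21 = 28, \\
\text{general:} \quad & 2^n,
  & n = 7: \quad & 2^7 = 128.
\end{align*}
```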

3 PREFERENCE AND PROBLEM ELICITATION

Both preference and problem elicitation share the common challenges of eliciting preference
information from a user. On one hand, information that the user does know and can express she
may not be willing to input to the system, especially if the elicitation process is burdensome. On
the other hand, the user may not fully know the information or be able to express it (at least via
the elicitation mechanism offered). Moreover, the very process of elicitation itself is known to
reshape, and at worst bias, the information provided; preferences can even be created by
the elicitation process [33, 41].

Preference elicitation. One approach to elicitation of general scheduling preferences is to
derive rankings of schedule options by showing the user specific examples as part of an
elicitation phase. groupTime [3] is a scheduling system that takes such an example-driven approach.
Viappiani et al. [41] show that presenting users with carefully chosen examples can help
stimulate their preferences, a technique known as example critiquing.

An alternative approach to elicitation has the user enter the parameters of the preference model
directly. The scheduling system of Hayes et al. [13] takes such a model-driven approach. The
advantage of this approach is that the elicitation period can be shorter, because the information
elicited has greater entropy. The significant disadvantage is that users must comprehend the
model, rather than examples that are possibly easier to comprehend.

The coefficients of a Choquet integral, the goal of our elicitation, can be derived from
statements over examples (“I rate this example more highly than that”) or over the model (“overlap
is more important for me than duration”) [24].

We chose to explore a form of model-driven preference specification for three reasons. First,
we considered that the user’s intuition of the scheduling domain reduces the cognitive effort to
understand the model, in contrast to the need to digest an artificial example before giving
meaningful feedback based upon it. This is especially true for over-constrained examples, where a
perception of the calendars and preferences of involved participants is necessary to make an
informed judgment over scheduling options. Second, the use of online learning in PTIME updates
the model from examples (real-life examples for which the user already has in mind the meeting
request, and participants’ calendars and preferences, to some degree). Third, we considered it
advantageous to enable the user to employ PTIME without an extended elicitation phase, by
developing a one-shot, model-driven elicitation.

Hence, our approach to preference elicitation combines an interactive, visual interface and
a series of simple elicitation invitations. Each invitation is presented as a panel, such as in
Figure 2, that asks the user to provide some statements of her general scheduling preferences.
The panels are presented in a ‘wizard’ interface that allows the user to freely step forward and
backward through them, and to ignore any invitation (i.e., provide no information for it).

The first panel invites the user to describe her general temporal preferences over a week, by
painting them onto a calendar view ‘canvas’. This direct elicitation of temporal preferences
follows prior calendaring agents, such as [13] and contemporary work, such as [3].

From the information entered by the user, we infer qualitative preference statements on and
between the criteria, and compile these statements into quantitative coefficients in the Choquet
representation. This compilation is performed by solving a linear program (LP); for the details
we refer to [24]. The preference statements we infer encompass information regarding both the
relative importances of the criteria (i.e., the $a_i$ Choquet coefficients), and the trade-offs between
criteria pairs (i.e., the $a_{ij}$ coefficients).

Figure 2: A single panel of the criteria preference elicitation wizard. Users are asked to position
each criterion to reflect its importance relative to both other criteria and qualitative
importance labels.

In the panel shown in Figure 2, the classification of the criteria into the four buckets speaks to
the first aspect ($a_i$), and the pairwise relative positions of the criteria speak to the second aspect
($a_{ij}$). For instance, from the position of the overlap criterion in the ‘very important’ bucket,
the system will infer a value for the corresponding $a_3$ coefficient close to 1. From the relative
positions of this criterion and the duration criterion, the system will infer a value for the $a_{23}$
coefficient less than zero.

This two-stage approach of interactive elicitation followed by compilation is designed to blend
intuitive, lightweight elicitation for the user, in qualitative, domain-dependent terms familiar to
her, with the eventual derivation of quantitative coefficients required by our model. Moreover,
the process is robust to the amount of information, falling back to an uninformative, default
instantiation of the model in the case of no information.¹

It is well known (e.g., [19]) that people do not act rationally and often do not have consistent
desires. Thus, the preference statements made may result in conflicts. In these cases, we
iteratively relax or remove conflicting statements until the resulting LP is consistent.²

¹ Equal importance ($a_i = 1/n$) to each criterion; no interaction ($a_{ij} = 0$) between each pair of criteria.

² We judged that any benefit from including the user in this process would be outweighed by the burden of possible
added confusion. Relaxation of inconsistent Choquet models is a research topic beyond our scope here [23].
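
To illustrate the idea of discarding conflicting statements, the sketch below works on a deliberately
simplified stand-in for the LP: each statement is a strict ordering “x is more important than y”,
and a set of statements is treated as consistent when it contains no cycle. This cycle test replaces
the LP feasibility check purely for illustration, and the greedy keep-earlier-statements policy, the
function names, and the example statements are all assumptions rather than PTIME’s procedure.

```python
from typing import Dict, List, Set, Tuple

Statement = Tuple[str, str]  # (more_important, less_important)

def _reaches(graph: Dict[str, Set[str]], start: str, goal: str) -> bool:
    """Depth-first search: is `goal` reachable from `start` via accepted statements?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

def relax_conflicts(statements: List[Statement]) -> Tuple[List[Statement], List[Statement]]:
    """Keep statements in order, dropping each one that contradicts those already kept."""
    graph: Dict[str, Set[str]] = {}
    kept: List[Statement] = []
    dropped: List[Statement] = []
    for more, less in statements:
        # "more > less" is contradictory if "less > ... > more" already holds.
        if more == less or _reaches(graph, less, more):
            dropped.append((more, less))
        else:
            graph.setdefault(more, set()).add(less)
            kept.append((more, less))
    return kept, dropped

statements = [
    ("overlap", "duration"),
    ("duration", "location"),
    ("location", "overlap"),   # closes a cycle with the two statements above
]
kept, dropped = relax_conflicts(statements)
```

In this example the first two statements are kept and the third is dropped. A fuller treatment would
check feasibility of the actual LP constraints and might weaken a statement rather than remove it.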

Problem elicitation. Like preference elicitation, our approach to problem elicitation is based
on an interactive interface. Common office calendaring tools, such as Microsoft Outlook, obtain
meeting details by allowing the user to select a block of time on a calendar interface, and then
filling in details in a form. Recent tools, such as Google Calendar [11], allow the user to specify
the meeting details by entering natural language (NL) sentences. In both cases, such tools are not
seeking to elicit information in order to formulate and solve a problem of presenting schedule
options, but simply to fix a meeting in the user’s calendar.

The information for our scheduling task can be both broader (“An afternoon next week”) and
more specific (“Bob is an optional participant”) than that required by calendaring tools. We are
seeking to elicit not the details of the meeting itself, but details of the meeting request: a
potentially rich set of soft constraints that will be used to formulate a constrained scheduling problem
instance. Since the constraints are relaxable, problem elicitation can be seen as elicitation of
problem-specific (in contrast to generic scheduling) preferences.

Thus, the deficiencies of common approaches are magnified. On one hand, form-based elicitation
ranges from restrictive, since the user is limited to the fields in the form, to intimidating,
if the form contains enough fields to allow expression of rich constraints. Further, users are
observed to feel they must fill in a field simply because of its presence [33]. On the other hand,
unrestricted NL can leave the user unsure of what to enter, especially once the illusion that
the system really understands everything entered is (inevitably) broken. Moreover, while forms
place restrictions or requirements on the user, NL provides little guidance about what can be
expressed: for example, that the user can specify optional participants.

Figure  3  shows  our  problem  elicitation  interface.  In  the  top  section  of  the  window  we  see  a 
system-user dialogue, following DiamondHelp [37]. In the bottom section of the window we see
dynamic,  context-specific  content  —  here  the  meeting  request  interface.  Based  around  an  NL 
input  mechanism  (centre),  the  system  summarizes  the  information  the  user  has  entered  already 
(bottom),  and  stimulates  her  with  ‘example’  lines  (top).  The  design  choices  and  rationale  behind 
them  are  discussed  in  [2]. 

Because of the NL interface, the user can readily specify soft constraints (i.e., constraints that
may be satisfied only to a degree, in contrast to hard constraints that must be satisfied exactly),
such as preferred times (e.g., “prefer early”, or “prefer 3pm”), and optional and preferred
locations and participants. The user can also flexibly and succinctly specify time windows (e.g.,
“tues afternoon”).

Transparency and predictability are enhanced by the system reporting what it understood:
note the “You entered:” message and the highlighted fields. This can be contrasted with the more
heavyweight NL clarifications in RhaiCAL [5]. Our clarification minimizes the cost of poor guesses
by the system (another principle of Horvitz) by making explicit what it understands, at a glance.
Further, auto-completion removes the burden of typing in (and possibly misspelling) locations,
participant names, and so on. This lessens another characteristic problem of a natural language
modality.

The NL interface provides a mechanism for specifying preferences that is designed to be
superior to a form-based direct manipulation input interface. Nonetheless, since easy access to direct
manipulation is among the principles for mixed-initiative user interfaces [16], and recognizing
that some users find a form-based interface more familiar, we provide an option to “Switch to
Form View” that allows direct input of a subset of the possible constraints.

[Figure 3 screenshot: the “Schedule a meeting” window, with an “Enter meeting details” text field
and example lines above it, a “Switch to Form View” option, the “You entered:” line, and the
“Current Meeting Details” summary with defaults for duration, location, and importance.]

Figure 3: Meeting request (problem) elicitation interface. The “You entered:” line displays the
previously entered text, while the “Current Meeting Details” show how the system
parsed the text.

Our interface is suited to specification of straightforward temporal preferences for the meeting,
such as a combined preference for the morning over the afternoon, and later in the morning
over earlier. It is less suited for specification of finely grained preferences, such as “later
between 10-11am, or as soon as possible after noon, but not between 11-11:30am”. As the
last sentence exemplifies, such highly detailed preferences are cumbersome to describe in text.
Rather, a dedicated direct-manipulation interface for such preferences would be preferable [6].
We have not implemented this complementary kind of interface, because our first user studies
indicated that meetings are rarely requested with such detailed preferences. We think it better
for the user to retain these preferences and use them to guide among relaxations of the meeting
request, if warranted, when the schedule options are presented by the system.

Fundamentally, the scheduling domain scopes the statements the user wishes to input so that
restricted NL is practicable, and directs the statements, so that it is intuitive. Altogether, the
assisted NL interface was favourably received by users in the restricted setting of our calendaring
domain, relative to the direct manipulation interface. Our user studies are reported below in
Section 6.

In  the  background,  while  the  user  is  specifying  the  meeting  request,  the  system  retrieves  the 
schedule  of  the  user  and  the  free/busy  times  of  all  specified  rooms  and  participants.  Lightweight 
constraint  reasoning  is  performed  after  each  change  to  detect  conflicts  and  display  them  with 
suggestions for removing them. The calendaring domain allows specialized partial satisfaction
and explanation mechanisms [18].
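
As a rough illustration of the kind of lightweight check described above, the following Python sketch reports which participants or rooms have no free slot of the requested duration inside the requested time window. It is a minimal sketch under stated assumptions, not the PTIME implementation: the `Interval` type, the function names, and the use of minutes as the time unit are all hypothetical, and the repair suggestions mentioned in the text are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A half-open time interval [start, end), in minutes (hypothetical unit)."""
    start: int
    end: int


def has_free_slot(window, duration, busy_intervals):
    """Return True if `window` contains a gap of at least `duration` minutes."""
    cursor = window.start  # earliest time not yet known to be busy
    for busy in sorted(busy_intervals, key=lambda iv: iv.start):
        if busy.end <= cursor:
            continue  # busy period lies entirely before the cursor
        if busy.start >= window.end:
            break  # busy period lies entirely after the window
        if busy.start - cursor >= duration:
            return True  # gap before this busy period is long enough
        cursor = max(cursor, busy.end)
    return window.end - cursor >= duration


def find_conflicts(window, duration, busy_by_name):
    """List the participants or rooms that cannot fit the meeting in `window`."""
    return [
        name
        for name, busy_intervals in busy_by_name.items()
        if not has_free_slot(window, duration, busy_intervals)
    ]
```

A check of this shape is cheap enough to rerun after each change to the request, which is the property the text relies on.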

We provide an option for the user to display her current calendar. The calendar display
includes options to overlay the day/time preferences and free/busy availability of participants
requested in the meeting so far. Combined, the conflict display and repair suggestions, and the
calendar and preference display, inform the user as she composes her meeting request.

4 CONSTRAINT REASONING

The  aim  of  constraint  reasoning  is  to  generate  candidate  options  in  response  to  a  scheduling 
problem  instance  that  results  from  problem  elicitation  (recall  Figure  1).  This  reasoning  must 
account  for  the  preference  model  of  the  user,  the  constraints  and  preferences  in  the  scheduling 
problem,  and  both  the  current  schedule  and  day/time  preferences  for  all  involved  participants. 

To  find  desirable  solutions  according  to  our  chosen  preference  model,  the  PTIME  Constraint 
Reasoner must search for candidate schedules using the Choquet integral as the objective
function. Thus, given our preference model and the state of the art in constraint-based temporal
reasoning,  we  have  two  questions  to  address:  (1)  how  to  represent  the  hard  and  soft  temporal 
constraints  in  a  form  that  translates  into  the  criteria  we  have  chosen;  and  (2)  how  to  extend  the 
temporal  reasoning  beyond  the  simple  objective  functions  found  in  the  literature  to  the  more 
complicated  Choquet  integral. 

Both questions arise from the core issue of this work: balancing the need for an expressive
preference model with the difficulty of learning and reasoning about it. In previous quantitative
constraint-based temporal optimization work, objective functions were based on simple
aggregations of local preference values, that is, values indicating how well each constraint is
satisfied (e.g., [35]). These objectives do not easily extend to encompass a preference model based
on a Choquet integral. First, our model is based on abstract (domain-relevant) criteria, not
individual constraints. Second, the model contains interactions between criteria, something not
expressible in existing frameworks, even for the case in which each constraint is considered a
distinct criterion.

Our reasoning approach has two facets that correspond to the two questions above. First, we
map each constraint to a subset of the criteria, based on the origin of each constraint. For
example, a soft constraint expressing allowed durations maps to the duration criterion, and a
constraint expressing that a person cannot attend two meetings at once maps to the overlap
criterion. The mapping is one to many: a constraint can map to multiple criteria.³

To give a sense of how we represent the soft constraints in the meeting problem, Figure 4
depicts a constraint, and preference functions, for an overlap constraint between an existing
meeting (E) and a new meeting (M) that both involve a common participant. In a hard (i.e.,
must be satisfied exactly) version of this constraint, either the existing meeting must start
after the new meeting ends, or the new meeting must start after the existing meeting ends. In the
depicted soft version, the preference functions allow for the events to overlap but specify that a
smaller overlap is preferred.
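
The soft overlap constraint can be illustrated with a small Python sketch. The preference functions of Figure 4 are not recoverable from the text, so the linear shape used here, the `max_tolerated` parameter, and the function names are assumptions chosen only to show the idea that a smaller overlap receives a higher local preference value.

```python
def overlap_minutes(e_start, e_end, m_start, m_end):
    """Length of the overlap between existing meeting E and new meeting M."""
    return max(0, min(e_end, m_end) - max(e_start, m_start))


def overlap_preference(e_start, e_end, m_start, m_end, max_tolerated=30):
    """Local preference value in [0, 1] for the soft overlap constraint.

    Assumed shape: 1.0 with no overlap, falling linearly to 0.0 once the
    overlap reaches `max_tolerated` minutes. The hard version of the
    constraint corresponds to accepting only a value of 1.0.
    """
    overlap = overlap_minutes(e_start, e_end, m_start, m_end)
    return max(0.0, 1.0 - overlap / max_tolerated)
```

With times given in minutes from midnight, back-to-back meetings score 1.0 under this sketch, and a new meeting that starts 15 minutes before the existing one ends scores 0.5.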

Formally, we model the constraints as a Disjunctive Temporal Problem with Preferences
(DTPP) [35]. Adding the mapping from constraints to criteria transforms the DTPP into a
Multi-Criteria DTPP (MC-DTPP) [27]. To optimally solve an MC-DTPP, a solution must be
found that maximizes the value of the Choquet integral, Equation (2). To optimize over the full
integral (which, recall, is a sum of as many as $2^n$ terms) would be extremely challenging. Every
search point would require significant time to calculate the objective value and, more important,
it would be difficult to develop effective heuristics or pruning strategies.


³An alternate and equivalent view is that a single criterion maps to many constraints. Viewed either way, there is
similarity with GAI preference models [7], in which overlapping subsets of attributes compose each factor.

[Figure 4 graphic: overlap constraint for new (M) and existing meeting (E)]

Figure 4: A DTPP constraint that represents the soft overlap between an existing meeting E and
a new meeting M involving the same person.


This difficulty highlights the trade-off between model expressiveness and tractable reasoning.
It motivates our choice of the 2-order Choquet integral, which captures at least one level of
criteria interaction, but can be represented using the sum of only $\frac{1}{2}n(n+1)$ terms. It also
guides our choice of criteria: criteria such as stability and perturbation mentioned earlier are
functions of an entire schedule, and thus are not easily incorporated into the standard DTPP
scheme of aggregating over local preference values.
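
To make the 2-order form concrete, the sketch below evaluates a 2-additive Choquet integral in the min/max form that is standard in the literature. Equation (2) is not reproduced in this section, so the exact form used by the system is an assumption here, as are the names `z` (degrees of satisfaction of the criteria), `a` (importance coefficients), and `a_pair` (interaction coefficients keyed by criterion pair).

```python
def choquet_2additive(z, a, a_pair):
    """Evaluate a 2-additive Choquet integral (assumed standard form).

    z:      list of criterion satisfaction degrees, each in [0, 1]
    a:      list of importance coefficients, one per criterion
    a_pair: dict mapping (i, j) with i < j to an interaction coefficient;
            positive values reward joint satisfaction (min), negative
            values reward satisfying either criterion (max)
    """
    total = 0.0
    # One linear term per criterion, discounted by its interactions.
    for i, (z_i, a_i) in enumerate(zip(z, a)):
        shared = 0.5 * sum(abs(v) for pair, v in a_pair.items() if i in pair)
        total += (a_i - shared) * z_i
    # One term per interacting pair.
    for (i, j), v in a_pair.items():
        if v > 0:
            total += v * min(z[i], z[j])
        elif v < 0:
            total += -v * max(z[i], z[j])
    return total


def coefficient_count(n):
    """Number of terms in the 2-order form: n singles plus n(n-1)/2 pairs."""
    return n * (n + 1) // 2
```

With equal importance and no interactions, this reduces to the arithmetic mean of the criterion values, which matches the uninformative default instantiation described earlier.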

The  second  facet  of  our  reasoning  is  thus  an  effective  algorithm  for  solving  an  MC-DTPP 
to  obtain  the  scheduling  options.  The  algorithm  augments  a  leading  branch-and-bound  DTPP 
solver [28] with special bounding logic and additional heuristics. We leave the details to [27].

5 SCHEDULE PRESENTATION AND LEARNING

From the many solutions generated by the Constraint Reasoner we want to present a subset,
the candidate scheduling options, to the user. On one hand, the system must not overwhelm
the user with too many options, but on the other hand, it must present enough options so that the
user can select one that is acceptable. In general, we want to present the most desirable solutions
first (according to the elicited preference model). However, the model will generally be an
imperfect approximation of the user’s “true” preferences, so we also want to present additional
solutions that are not necessarily rated highly but that are qualitatively different from those
thought most preferred according to the current model instantiation. This approach presents
desirable solutions and at the same time also enables the user to explore the solution space.
With each option that relaxes the meeting request, we provide an explanation in the form of a
simplified, natural language summary of the relaxed constraints.

While the elicited preferences provide a starting point for presenting solutions tailored to the
user, an effective scheduling assistant should refine its preference model over time. Thus, we
employ machine learning techniques. It would be disruptive to the user experience to interrupt
with learning questions; therefore, in a deployed setting, learning must happen online using only
feedback obtained through a user’s natural interaction with the system, i.e., through the user’s
selections from among the candidates presented; these are the training examples for the learner.

Recall that in our earlier work, the PTIME Preference Learner acquired user preferences over
temporal schedule features (i.e., day/time preferences) using support vector machine (SVM)
learning techniques for ranking [17]. The learner acquired the weights of a linear schedule
evaluation function that could be used to rank candidate schedules [9].⁴

Our shift to multi-criteria schedule evaluation fundamentally changes the learning task. While
the task remains to learn a schedule evaluation function, the features of the function are no
longer boolean features representing temporal properties of a schedule, but instead are
real-valued features representing the degree of satisfaction of higher-level criteria $U_i$, each of
which is itself a function of lower-level schedule features, as described earlier. Moreover, the
function being learned is a 2-order Choquet integral, rather than a linear weighted sum. This adds
the constraints that the coefficients obey $a_i \in [0, 1]$ and $a_{ij} \in [-1, 1]$, as mandated by the
Choquet model [21, 24].

Observe that the 2-order Choquet integral can be viewed as a linear weighted sum of a new
set of features (criteria) comprising affine combinations of the importance coefficients $a_i$ and
interaction coefficients $a_{ij}$. Based on this linearization, we can apply the same SVM learning
techniques as earlier to learn the coefficients (weights) for the resulting transformed function,
subject to the constraints on the values of the coefficients. Because the SVM learning problem
is essentially a quadratic optimization problem, to ensure that the coefficients learned are valid
Choquet coefficients, we could augment the quadratic programming problem with additional
constraints. However, simple rescaling of the learned weights into the range [−1, 1] is
sufficient for satisfying the constraint on the interaction coefficients, and we found that the SVM
learner naturally learns positive weights for the importance criteria just from the training
examples themselves. Thus, in the end, no modification of the input to the SVM learning algorithm
proved necessary.⁵

⁴In that work, our initial focus was on learning preferences in under-constrained situations, although the same
approach could have been used in principle to learn preferences in the more difficult over-constrained situations
as well, by adding features to represent the degree of satisfaction of the different scheduling constraints.

Since both the elicited and learned preferences have the same functional form, we use a simple
weighting scheme to combine them into the refined schedule evaluation function

$F'(Z) = \alpha \times A \cdot Z + (1 - \alpha) \times W \cdot Z$  (3)

where A is the vector of coefficients (weights) for Z representing the elicited preferences and
W are the learned weights for Z. By varying $\alpha$ we can vary the relative influence the initial and
learned preferences have on the evaluation of a candidate schedule. By decaying $\alpha$ over time,
we can dynamically modify this relative weighting to give more consideration to the learned
weights as the learner views more training examples, i.e., as the user employs the system.
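
The weighting scheme of Equation (3) can be sketched in a few lines of Python. The function names, the mixing coefficient written as `alpha`, and the particular exponential schedule with its `alpha0` and `rate` parameters are assumptions made for illustration; the text states only that the weight on the elicited preferences is decayed over time.

```python
import math


def blend_weights(elicited, learned, alpha):
    """Combine elicited weights A and learned weights W as alpha*A + (1-alpha)*W."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * w for a, w in zip(elicited, learned)]


def evaluate(weights, z):
    """Score a candidate schedule with feature vector Z as a dot product."""
    return sum(w * x for w, x in zip(weights, z))


def decayed_alpha(alpha0, rate, episodes):
    """Assumed exponential schedule: alpha shrinks as more episodes are seen."""
    return alpha0 * math.exp(-rate * episodes)
```

Because both terms of Equation (3) are linear in Z, blending the two weight vectors first and then taking a single dot product gives the same score as evaluating each term separately.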

Our conversion of the Choquet model to a linear function over an expanded set of features
ensures that we can continue to use a linear kernel. One advantage of linear kernels is that the
resulting learned function is more easily interpreted. (An alternative to the feature expansion
approach would be to change the kernel.) However, the feature expansion does lose the
connection between the importance criteria $a_i$ and the interaction criteria $a_{ij}$: as far as the learner
is concerned, it is learning relative weights for independent features. Thus, there is no guarantee
that the resulting learned function will be a well-formed Choquet integral and, in fact, we cannot
convert the learned weights back to valid Choquet coefficients. However, the MC-DTPP
solving algorithm discussed in the previous section does not rely on a valid Choquet form but
can accommodate a general MAUT objective function. Hence, provided that our learned
function successfully captures user preferences, we can generate and present desirable scheduling
options.
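
One known way to linearize a 2-additive Choquet integral is to extend the criterion values with the absolute difference of every pair of criteria; the integral then becomes a plain dot product. The text does not spell out the expanded feature set, so the sketch below is an assumption about its shape, and all names in it are hypothetical.

```python
from itertools import combinations


def expand_features(z):
    """Map criterion values z to the expanded vector (z_i..., |z_i - z_j|...)."""
    pairs = combinations(range(len(z)), 2)
    return list(z) + [abs(z[i] - z[j]) for i, j in pairs]


def choquet_to_linear_weights(a, a_pair):
    """Weights over the expanded features for given Choquet coefficients.

    Each criterion keeps its importance coefficient; each pair (i, j)
    receives -0.5 times its interaction coefficient (0 if absent).
    """
    pairs = combinations(range(len(a)), 2)
    return list(a) + [-0.5 * a_pair.get(pair, 0.0) for pair in pairs]


def linear_score(weights, features):
    """Plain weighted sum, the form a linear-kernel learner works with."""
    return sum(w * f for w, f in zip(weights, features))
```

A learner that fits the weights of `linear_score` directly treats all n(n+1)/2 features as independent, which is the loss of connection between importance and interaction coefficients noted above.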

It is worth emphasizing that, after the transformation of the model, we work entirely with the
linearized model. Both the constraint solving and the online learning work with the new
representation. Thus the loss of connection between the $a_i$ and $a_{ij}$ hinders neither computation
over the model nor its refinement. However, it does present a greater challenge in explaining the
current state of the model instance to the user, since techniques for explaining Choquet models
are based on the original, untransformed coefficients. Providing a user-intelligible explanation
of the current learned model is part of our ongoing work.


⁵In the few cases where the coefficients for the importance criteria are negative, they are very small and simply
bringing them up to zero or a small positive number did not affect learning performance.

6 VALIDATION

To assess the value of our approach to the trade-off between representation, reasoning, and
learning, we designed experiments to answer the following questions: (1) will the learning algorithm
converge to a given preference model given consistent feedback; (2) how well do users’ elicited
preferences match their “true” preferences; (3) how well does the Preference Learner perform
using feedback from real users on interesting scheduling instances; (4) does the Constraint
Reasoner provide optimal solutions in an adequate amount of time; and (5) how well does the entire
system perform in real-world situations?

6.1  Preliminary  Evaluation  of  Learning  Algorithm 

To validate the learner and to help prepare for deployment, we ran a set of experiments using
synthetic data: both randomly generated target functions (converted Choquet integral
coefficients) and randomly generated schedules (feature vectors Z). Each experiment simulated
an online learning situation where, at each iteration, the learned model ranked a set of candidate
schedules, the target model selected the best schedule, and the learned model updated with this
feedback in preparation for the next iteration. Based on the evaluation metrics discussed in the
next sections, the results of these experiments confirm that the SVM learner can successfully
learn the target functions in this setting, often within 10 to 20 scheduling episodes, each
involving 25 candidates. As the number of candidates is reduced, however, the number of iterations
required for convergence greatly increases. Figure 5 shows the rate of convergence for 5 and 25
candidates per iteration.

To test the effects of the quality of elicited (initial) preferences on performance, we varied
how well the initial preferences matched the true target model and then analyzed the effects
of different initial weights and decay strategies. The results suggest that instead of using a
fixed weighting scheme between the elicited and the learned preferences, we should consider
the quality of the elicited preferences in determining both the initial weight to give them and
how aggressively to decay that weight over time. Further experiments, including those reported
below, led us to choose an exponential decay.

6.2  Evaluation  of  Preference  Model  and  Elicitation 

To evaluate whether our preference model was able to capture the essential aspects of an
individual’s scheduling preferences, and whether our elicitation interface was adequate for users to
initially express their preferences, we conducted a user study consisting of two parts: a session
of preference elicitation with the interface described in the previous sections, and a series of
simulated meeting requests, conducted as a paper-based exercise.

We asked 11 subjects (with roles including software engineer, researcher, program manager,
and administrative staff) to perform a brief interaction with our system to elicit an instantiated
model of their preferences. Some of these subjects had participated in our user habits study
reported in Section 2.1; others had not. We then gave the subjects a hypothetical two-week
calendar of a knowledge worker, containing a moderate number of meetings, events, and
deadlines. We presented the subjects a series of meetings to organize. For each of the 11 meeting
requests, we asked each subject to rank 10 options, ties permitted. In an experimental oversight,
each subject was presented with the requests in the same order. However, our later repeat of a
subset of the experiment leads us to believe that the order of the meeting requests did not have
a significant impact on the options selected for each request by the subjects.

Figure 5: Spearman’s correlation coefficient as the number of scheduling instances increases, for
5 and 25 candidates per instance.

In contrast to the experiment in the previous section, the aim of this study was to investigate
the extent to which elicited (initial) preferences correlated with actual (“true”) preferences,
which are unknown. We based a measure of the subjects’ actual preference on the ranking of
choices the subjects made in each presented meeting request. To quantify the degree of similarity
between the elicited and the actual preferences, we used Spearman’s correlation coefficient
($\rho \in [-1, 1]$) [15] between the rankings provided by the subjects and the rankings induced by
the elicited models. A $\rho$ value of 1.0 indicates two identical rankings, while $\rho = -1.0$ indicates
complete disagreement.
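
For readers who wish to reproduce the measure, the following self-contained Python sketch computes Spearman’s coefficient between two score lists, with tied items sharing the average of the ranks they span. How ties were treated in the study is not stated, so the average-rank convention is an assumption, as are the function names.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions in the run
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5
```

Here `x` could hold a subject’s ranking of the options for one request and `y` the scores that the elicited model assigns to the same options (negated if higher scores mean more preferred).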

Overall, we observed a weak correlation between the actual and modelled preferences. The
arithmetic mean of $\rho$ across subjects and requests is 0.26, with a standard deviation of 0.32. The
variance between subjects is substantial: the mean $\rho$ value (across the subjects, averaged over
the requests) varies from 0.08 to 0.42.

Figure 6 depicts the variance of $\rho$ for each meeting request. The vertical lines show the range
of $\rho$ values over the subjects, and the boxes indicate one quartile from the mean. Observe that
the weakest correlation is for Request 3, which is the only request for which the mean $\rho$ is
negative (−0.02). Request 3 (“All day travel to LA office”) is a situation where factors outside
those included in our preference model are significant, and thus correlation may be expected
to be poor. By contrast, the strongest correlation is for Request 6 (0.46), which is a situation
with delicate trade-offs between time and participants, the type of situation to which our model
is specifically dedicated. These facts indicate that our chosen features work satisfactorily for
the common case. We could add features to address the poor correlation for Request 3, but the
additional features may slow the rate of convergence for the learning and will make constraint
solving more difficult. In free-form interviews after the study, subjects expressed satisfaction
over the features included in the preference model.

Figure 6: Spearman’s correlation coefficient for each request and subject, grouped by request.

The least variance over subjects is for Request 1 (0.21 standard deviation); the most is for
Request 9 (0.48 standard deviation). Request 1 is a simple situation where most of the options are
favorable;  here,  if  the  elicited  model  correlates  well  with  the  true  preferences  of  one  subject,  it 
is  likely  to  do  so  for  all  subjects.  Request  9  is  distinguished  by  conflicts  among  the  preferences 
of  other  participants;  here,  the  subjects  assess  the  value  of  the  options  differently  according  to 
how  they  balance  the  importance  of  others’  preferences. 

To obtain a baseline for these Spearman $\rho$ values, we compared each subject’s elicited ranking
to the ranking produced with two other means of instantiating the model: a default instantiation
that corresponds to equal weights for all criteria, and a random (and possibly invalid)
instantiation in which all Choquet coefficients are chosen randomly. This exposes the extent to which
the correlation is due to the adequacy of the elicitation as opposed to the form of the model, or
any other variable.

The  results  showed  a  mean  Spearman  value  (across  all  subjects  and  requests)  of  0.28  for  the 
default  and  0.27  for  the  random,  both  of  which  are  slightly  better  than  our  elicited  average;  the 
standard  deviation  was  0.30  and  0.31,  respectively  (about  the  same  as  for  elicited  initialization). 
If  we  separate  out  the  five  subjects  who  had  seen  and  used  the  interface  before  from  the  six  who 
had  not,  we  notice  a  substantial  spread  in  the  Spearman  averages:  the  experienced  users  had 
an  average  of  0.33  (higher  than  the  default),  whereas  the  inexperienced  users  had  an  average  of 
0.19.

We might attribute this result to the elicitation interface, to the subjects’ inability to
comprehend their statements in terms of real-world scenarios, or possibly to the subjects’ lack of
understanding of their preferences. Informal discussions supported the last hypothesis: several
subjects reported that the process of ranking schedule options in the exercise helped them
realize their own scheduling preferences, in particular over the trade-off of different criteria. As one
subject stated, this indicates that individuals have a poor grasp of their actual preferences in this
area; it suggests we should expect a substantive gap in correlation between elicited and actual
preferences. We allowed two subjects to restate their preferences using our elicitation interface;
both restatements resulted in a small increase in the Spearman correlation (by 0.02 and 0.06,
respectively), which agrees with the results we found with the experienced users above.

To further examine the variance in the results, we asked five of the subjects to perform a
second paper-based exercise. Unknown to these subjects, the second exercise was equivalent
to the first, but with textual descriptions, request order, and schedule option order varied. Of
these five subjects, the mean Spearman correlation decreased for four, and increased for the last
subject. The variance is approximately unchanged for each subject. Only one of these users had a
significant deviation (−0.18), although this user was unique given that he interpreted the exercise
much differently than others (he did not rank any schedule that he deemed “unacceptable”).
These results indicate that there is a moderate but not dominating perturbation factor in the
study arising from the subjects’ inconsistency in their scheduling decisions.

Overall, the small but positive correlation between the actual and elicited preferences can be
explained by several factors: (1) poverty of the chosen preference model (i.e., it does not include
relevant features); (2) inadequacies in the elicitation; (3) lack of users’ awareness of their
preferences; and (4) inconsistencies in user scheduling decisions. Based on user feedback, model
expressiveness is only found lacking in situations such as Request 3, exceptional cases for which
the model was not designed. Based on the difference between more and less experienced
subjects, (2) and (3) have significance. Based on the small deviations from subjects who undertook
the exercise twice, (4) is found to have a small impact. Together, (2), (3), and (4) point to the
benefit of revising the initial model by learning, supporting our overall paradigm. Lastly,
although our lightweight model-based elicitation has advantages, the better performance of the
default than elicited model instantiation for inexperienced users suggests that example-based
elicitation may be valued.

6.3  Learning  Experiments  on  Static  User  Data 

The above study provided an opportunity to evaluate learning on real user data. We performed
a leave-one-out cross-validation experiment on this data to see whether we could measure the
effects of learning. We used each request in turn as the test example after training on the
remaining requests, and then averaged the results over all the requests. To evaluate the effects of the
elicited preferences on learning performance, we compared performance with the default and
elicited model initialization. For the baseline (control) conditions, the training step was omitted,
thus measuring the performance of the initial preferences only.
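
The protocol just described can be sketched as below. This is an illustrative harness only: the weighted-sum model, the perceptron-style pairwise learner, the pairwise-agreement metric, and the toy requests are all assumptions introduced for the example. They stand in for the Choquet model, the online learner, and the Spearman metric used in the study.

```python
from statistics import mean


def leave_one_out(requests, initial_model, train, evaluate, learn=True):
    """Average evaluate() over folds, holding out each request in turn.

    With learn=False the training step is skipped, which gives the baseline
    (control) condition that measures the initial model alone.
    """
    scores = []
    for i, held_out in enumerate(requests):
        model = initial_model
        if learn:
            model = train(initial_model, requests[:i] + requests[i + 1:])
        scores.append(evaluate(model, held_out))
    return mean(scores)


# --- Hypothetical stand-ins for the model, learner, and metric ---
def score(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def train(weights, requests, rate=0.1, epochs=20):
    """Perceptron-style updates on every mis-ordered pair of options."""
    w = list(weights)
    for _ in range(epochs):
        for options in requests:  # options are listed best-first by the user
            for hi in range(len(options)):
                for lo in range(hi + 1, len(options)):
                    if score(w, options[hi]) <= score(w, options[lo]):
                        w = [wi + rate * (a - b)
                             for wi, a, b in zip(w, options[hi], options[lo])]
    return w


def evaluate(weights, options):
    """Fraction of option pairs that the model orders as the user did."""
    pairs = [(hi, lo) for hi in range(len(options))
             for lo in range(hi + 1, len(options))]
    agree = sum(score(weights, options[hi]) > score(weights, options[lo])
                for hi, lo in pairs)
    return agree / len(pairs)


# Toy data: each request lists option feature vectors in the user's order.
requests = [
    [(0.9, 0.1), (0.5, 0.8), (0.2, 0.4)],
    [(0.8, 0.7), (0.6, 0.2), (0.1, 0.9)],
    [(0.7, 0.3), (0.4, 0.6), (0.3, 0.1)],
]
initial = [0.0, 1.0]  # a deliberately poor initial model

baseline = leave_one_out(requests, initial, train, evaluate, learn=False)
learned = leave_one_out(requests, initial, train, evaluate, learn=True)
print(baseline, learned)
```

Passing the initial model as a parameter mirrors the comparison in the text: the same harness can be run once with a default initialization and once with an elicited one, each with and without the training step.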

Unfortunately, the amount of data available in the user study was not sufficient to illustrate
the effects of our learning algorithm.

What amount of data would be expected to determine whether online learning has an effect
(positive or negative)? Figure 5 shows that scheduling with only five diverse candidates requires
well over 20 instances to begin showing significant improvement with respect to the Spearman
metric. Moreover, in real-world scenarios many of the candidates are very similar to or
dominated by others. Thus in Figure 5 even five candidates, being generated randomly, are equivalent
to many more candidates in a real-world setting; hence the number of instances required for
online learning to be exhibited is significantly larger than 20. In contrast, the study of Section 6.2
made available only 11 instances.

6.4  Evaluation  of  Constraint  Reasoner  Performance 

One requirement for our model was that it be simple enough to allow the PTIME Constraint
Reasoner to find the top N solutions (N = 35 in our case) in a reasonable amount of time for
an interactive system. For scheduling problems involving four participants, four meetings per
user, and a two-day time period, the average time to find the top 35 solutions is 0.4 seconds.*
Increasing the number of participants to six and then ten, the average times are 4.3 seconds and
14.4 seconds, respectively. A typical six-participant problem has approximately 225 constraints
for each meeting, 140 of which are either disjunctive or contained preferences (non-disjunctive
constraints without preferences do not need to be reasoned about during search, and therefore
have negligible impact on reasoning time).

As the problem size increases, the time to find the optimal solution increases dramatically.
However, our MC-DTPP algorithm still finds high-quality solutions for the largest problems we
encounter (10 users, 20 meetings per user in the relevant time range). We ensure a responsive
user experience by imposing a time limit on the reasoning, returning as the candidate set the best
of the scheduling options found in that time. The computation is anytime: the system continues
to search for further options as the user views those already found, and the user can explicitly
request the system to “Search for More” options.
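
The time-limited, resumable behaviour described above can be illustrated with a small sketch. It is not the MC-DTPP algorithm: it assumes only that some solver yields candidate schedules one at a time, and it shows how the best N found so far can be returned at a deadline while the search remains resumable. The class name, its parameters, and the integer "schedules" in the usage lines are hypothetical.

```python
import heapq
import itertools
import time


class AnytimeTopN:
    """Keep the best n candidates seen so far from a stream of solutions."""

    def __init__(self, candidates, score, n, clock=time.monotonic):
        self._candidates = iter(candidates)
        self._score = score
        self._n = n
        self._clock = clock
        self._heap = []  # min-heap of (score, tiebreak, candidate)
        self._tiebreak = itertools.count()  # avoids comparing candidates
        self.exhausted = False

    def search(self, time_limit):
        """Search for at most time_limit seconds; return results best-first.

        Calling search() again resumes where the previous call stopped,
        which corresponds to a "Search for More" request.
        """
        deadline = self._clock() + time_limit
        while self._clock() < deadline:
            try:
                cand = next(self._candidates)
            except StopIteration:
                self.exhausted = True
                break
            entry = (self._score(cand), next(self._tiebreak), cand)
            if len(self._heap) < self._n:
                heapq.heappush(self._heap, entry)
            else:
                # Push the new entry, then drop the current worst one.
                heapq.heappushpop(self._heap, entry)
        return [cand for _, _, cand in sorted(self._heap, reverse=True)]


# Usage with toy "schedules" that are simply their own quality scores.
searcher = AnytimeTopN([3, 9, 1, 7, 5, 8], score=lambda s: s, n=2)
print(searcher.search(time_limit=0.5))
```

Because the heap persists between calls, a later call never discards a good option found earlier; it can only replace the weakest retained option with a better one.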

6.5  Learning  Experiments  on  Dynamic  User  Data 

Our final study evaluated data collected from two separate scheduling sessions. The intent was
twofold. First, to investigate particularly the third and fifth questions posed earlier: learning
performance using feedback from real users on interesting scheduling instances, and overall
system performance in real-world situations. Second, to attempt to collect larger data sets than
in our earlier study with static user data. As we will explain, we were not successful in this
second aim, and thus the first aim could only be partially addressed.

The first source of data for this endeavor was a pilot study with four users, lasting for two
hours. Each subject was asked to perform the preference elicitation step and then asked to
schedule as many meetings as possible within the remaining time. Prior to the study, the calendar
of each user was populated with a small number of meetings in order to increase the likelihood
of non-trivial scheduling situations.

The subjects were asked to schedule the meetings as they would with their own calendars.
Specifically, they were to issue meeting requests to the system like they would in real life, and
to choose the schedules that they normally would. This resulted in two main differences from
the previous user study: (1) the scheduling scenarios were closer to the individuals’ real-life
situations, and (2) not all the scheduling instances were “interesting” in the sense of exploring
a wide variety of trade-offs between criteria. For example, if a user enters the request “meeting
re: ‘project status’ next Wednesday with Ken Doran” and both the user and Ken are free most of
next Wednesday, several meeting times are likely to be equally acceptable to the user. Not only
will the learned model already be likely to rank an acceptable meeting time highly (negative
examples provide much more information for learning), but even a poor model would have a
good chance of ranking an acceptable meeting time highly as well.

* Footnote to Section 6.4: the experiments were run on a 1.7GHz Pentium M with 2GB RAM.

The fact that not all scheduling situations are “interesting” has mixed implications for the user
experience. Since our Constraint Reasoner will tend to produce preferred solutions for the easy
instances regardless of the quality of the learned model, the user will get the impression that
the system understands their preferences, and their trust in it is thus likely to increase. However,
since the learner gets informative data only a fraction of the time, the system will be slower
to converge on the true (and possibly changing) preferences. In this short user study, given an
average of ten scheduling requests made by each participant, only two or three proved useful to
our learner. Furthermore, although up to 35 solutions are presented for each scheduling request,
the average predicted rank of the user’s choice was 2.54; that is, an acceptable schedule was
usually in the first three presented. Thus, there was very little room for learning to improve
performance. Indeed, a leave-one-out cross-validation experiment run on the data collected
in this study showed no clear performance improvement with learning. The early rank of the
subjects’ choices means that the subjects may have simply selected the first acceptable option,
rather than the “best” one. This interpretation is supported by [32], which argues that scheduling
is often more about “satisficing” than “optimizing”.

The data for the second part of our study with dynamic user data came from a week-long
formal evaluation of the CALO system. Sixteen subjects participated in the evaluation. Each
worked with a CALO agent under realistic conditions for a Critical Learning Period (CLP) that
lasted over a week. Questions were administered to two flavours of each CALO after the end of
the CLP. The first flavour, LCALO, made use of the knowledge it learned during the CLP; the
second, BCALO, was stripped of such knowledge.

A set of parameterized questions was instantiated for and asked of each LCALO and BCALO
pair. The scores were aggregated over a large set of such questions that were used to cover
a broad range of cognitive assistant functions. Scoring was derived by comparing CALO’s
answers to the instantiated questions with answers elicited from the subjects in an explicit post
data collection phase. The experimental design, the questions and their instantiations, and the
scoring were conducted by an independent party.

Because the goal of the formal evaluation was to assess the learning ability of the overall
CALO system, the subjects were not exercising PTIME continuously (in contrast to the subjects
in our own pilot study described above). Despite the extended time frame of the data
collection period in the experiment, the number of scheduling instances was not substantial.
After accounting for technical difficulties with the system, the data of 13 subjects was usable.
The number of instances per subject varied between 4 and 31; the mean number of instances
was 13 and the median was 11.

We compared the answers given by BCALO, LCALO, and the users, to the three
instantiations of one parameterized question. Each instantiated question asked for a total ranking of
five candidate options to a given meeting request. BCALO uses the default, uninformative
instantiation of the Choquet model. We also compared RCALO (random model instantiation) and
ECALO (instantiation derived from the user-elicited preference statements).

Overall, the results on this data agree with the results from the data of our earlier smaller-scale
study of Section 6.2. Model instantiation from elicitation is superior (by Spearman correlation of
the rankings, in response to the question) to a random model instantiation; default and elicited
model instantiations are similar. Online learning is observed, and might even help on more
occasions than it hinders. Throughout, the deltas are small; in large part this can again be
attributed to the modest amount of data.

The true test of our balance between elicitation, learning, and constraint reasoning is how
well our system performs on the complicated scheduling requests similar to those given in our
paper-based study: the difficult over-constrained situations for which PTIME is designed. As
the environment supporting our system becomes more stable, we will begin to collect data over
a longer term and evaluate how quickly our system learns to rank schedules in the difficult cases.
This limited study did reveal that, as a whole, the overall PTIME system provides a positive user
experience even in cases where the learning has no effect.

7  RELATED  WORK

In this section we concentrate on related work in automated and semi-automated calendaring
systems. While both commercial (e.g., Outlook/Outlook Exchange Server, Sun Calendar
Manager/Calendar Server) and open source (e.g., Zimbra, UWCalendar) calendaring systems abound
to support centralized solutions within an institution, they leave the task of choosing a meeting
time up to the user, thus avoiding the need to actively reason about user preferences. As an
advanced example, Meeting Maker [36] supports the user in selecting a time by graphically
depicting the availability of participants. The strength of this and other group calendaring systems
is integration with institutional workflow (e.g., centralized calendar servers, room bookings); it
is a tool rather than a scheduling assistant like PTIME.

While  prior  academic  research  has  looked  at  one  or  more  of  the  three  aspects  of  modelling 
and  eliciting,  learning,  and  reasoning,  the  resulting  systems  have  rarely  sought  to  encompass  all 
three.  At  the  same  time,  researchers  have  considered  the  distributed  meeting  negotiation  task, 
which  we  do  not  address  in  this  report,  from  both  theoretical  ([4,  38])  and  system  ([39,  8,  3]) 
angles. 

A  system  we  will  call  Tulsa  [13]  implements  earlier  distributed  constraint-based  negotiation 
algorithms  [39]  with  an  emphasis  on  making  the  meeting  scheduling  process  more  efficient. 
While  Tulsa  accounts  for  a  wide  variety  of  user  preferences,  the  preferences  are  not  learned. 
Moreover,  it  remains  to  be  proved  that  users  are  willing  to  view  and  manipulate  a  screen  of 
scrollbars  to  provide  the  parameters  for  Tulsa’s  weighted  sum  preference  model. 

The value of applying machine learning to user scheduling preferences was recognized by
Kozierok and Maes [20], who presented the Learning Interface Agent (LIA). The system learns rules
for how to respond to scheduling situations (e.g., accepting a meeting request or not), and
preferred meeting times. The preference model for the latter is based on user day/time preferences,
and user assessment of the importance of the preferences of other users. The suggested times are
found not by constraint reasoning but by finding the timeslot with the maximum weighted
preference score; non-temporal relaxations are not considered. In further contrast to our work, LIA
requires explicit user feedback to correct mistakes. Its designers [20] emphasize the centrality
of user confidence in an assistive agent, and thus the centrality of both adjustable autonomy and
explanation, e.g., giving confidence levels for agent predictions.

Calendar  Apprentice  (CAP)  [25]  takes  the  antipodal  approach.  CAP  offers  no  automated 
reasoning  but  provides  scheduling  assistance  in  the  form  of  learned  recommended  values.  The 
meeting  request  is  elicited  by  a  sequence  of  prompts  (equivalent  to  a  fixed-order  filling  in  of  a 
form);  CAP  offers  a  suggested  value  at  each  step  and  acquires  training  examples  when  the  user 
overrides  these  values.  Thus,  the  learning  is  unobtrusive,  like  PTIME,  but  in  contrast  to  our 
work  the  learning  is  performed  offline,  and  rules  rather  than  a  preference  model  are  learned. 

The same preference model necessary for meeting scheduling is of use in the wider
personalized time management context. Augur [40], for instance, is a system designed to retain the
flexibility of an individual’s calendaring process while opening up interpersonal communication
via shared calendars. Augur learns models of user event attendance, and augments a calendar
display with iconic, colour, and transparency visualizations.

A similar group facilitation approach is found in groupTime [3]. groupTime models user
temporal preferences and, like Augur, learns models of user attendance. In addition, groupTime
offers suggested times for a meeting request, based on the participants’ preferences and predicted
attendance. The meeting request time is elicited by painting preferences onto a calendar. In
contrast to PTIME, groupTime offers a web-based multi-phase negotiation process among the
meeting invitees, in a university setting where there is equality among the participants (i.e.,
no designated host arranging the meeting). groupTime’s reasoning is limited to visualizing
the aggregated preference of each timeslot; suggested schedule options created by relaxing the
request are not provided.

Like LIA, CAL, and groupTime, RhaiCAL [5] (a descendant of the RCAL scheduling agent
[34]) presents alternate scheduling options on a calendar view. Like PTIME, RhaiCAL’s mode
of operation is to support a single meeting organizer, rather than the many-equal-participants use
case of groupTime; like PTIME it is intended for the difficult case of over-constrained,
multiple-person meetings. RhaiCAL’s display of schedules is designed to handle over-constrained
situations by indicating to the meeting host each participant’s preferences. While relaxations of the
meeting request are computed, these principally revolve around temporal constraints, in
contrast with our sophisticated trade-offs. RhaiCAL employs availability bars [6], an interaction and
visualization tool for temporal preferences. Such direct manipulation interface tools could
complement our NL-based elicitation of meeting request constraints and preferences. They could also
inform the user as she selects among candidate schedules; our current visualization of scheduling
options is based on similar principles to availability bars, but is invoked via an “Explain” option,
rather than being integrated with a simultaneous calendar view of the candidate schedules.

A sibling of RhaiCAL, CMRadar [26] is the work most similar to the approach taken by
PTIME. A CMRadar agent helps its user schedule meetings by parsing out meeting
information from email. It generates schedule options using a constraint-based scheduler that takes
into account user preferences and presents candidate schedules with a graphical visualization.
CMRadar has a simple natural language parser based on templates that are used to interpret
meeting-related emails. This mirrors our NL input mechanism, but is less flexible and on a
smaller scale. The CMRadar preference model is a weighted sum over scheduling features,
which is a restricted case of our Choquet form. CMRadar uses a passive learning approach to
learn user preferences [30], which contrasts with PTIME’s online, active learning.

8  CONCLUSION

Personalization  is  a  key  requirement  for  adoption  of  automated  scheduling  technology.  A  model 
of  the  user’s  preferences,  on  one  hand,  must  be  expressive  enough  to  capture  the  salient  features 
that  distinguish  one  scheduling  option  from  another  in  the  user’s  judgment.  On  the  other  hand, 
the  model  must  be  amenable  to  elicitation  to  populate  instances  of  it,  to  explanation,  and  to 
reasoning  over  scheduling  requests  to  derive  candidate  schedules.  Moreover,  if  the  system  is  to 
adapt  itself,  the  model  must  further  be  amenable  to  the  machine  learning  techniques  employed. 

Based on an initial user study, we devised a preference model designed to balance these needs.
We have implemented this model, which is based on a Choquet integral form, in a deployed
scheduling assistant agent. The implementation necessitated a user interface design that exposes
the richness of the model without overwhelming the user, novel constraint-based reasoning to
find optimal schedules according to the model, and adaptation of earlier work in non-obtrusive
online learning to update the model as the user employs the system. A distinguishing feature of
the model developed is the ability to express sophisticated trade-offs between schedule features;
capturing such trade-offs is crucial if a system is to offer the user desirable relaxations of
over-constrained meeting requests, the very situation where scheduling assistance is potentially the
most valuable.

The inconclusive results of our experiments point to the difficulty of evaluating a model and
learning process. Positively, the user interviews indicate that our model is expressive enough
to capture user scheduling preferences, to a degree sufficient for the types of meeting requests
made by knowledge workers in a typical office setting. Although we find that learning obtains an
inexact representation of the schedule ranking function (as measured by Spearman’s ρ), the
refined preference model obtained by the learning becomes adept at suggesting the most preferred
schedules.

Users found that the implemented system provides reasonable scheduling options from its first
use and exhibits increasing trustworthiness over time, complementary aspects found essential
if scheduling technology is to be adopted in practice [25]. Integration with the user’s existing
calendaring systems and workflow combines with the low demand of the preference elicitation
and learning to further support adoption of the system [32, 2]. An extensive formal study with
real users in a deployed setting indicates user satisfaction with the overall system.

These positive points notwithstanding, the experiments were not able to evaluate the success of
our paradigm of lightweight elicitation of an initial model and its subsequent non-intrusive
refinement. While users found the system’s behaviour satisfactory, the evaluation with human subjects
did not demonstrate that an elicited instantiation of the preference model is superior to a random
instantiation, nor that online learning converges to the user’s true model in practical settings
(although in artificial settings it does). In other words, the experiments were unable to determine
the correlation or otherwise between the observed satisfactory performance of the system and the
model, elicitation, and learning approach underlying it.

The reasons for this inability, as noted earlier, include the small number of subjects, a lack
of data for learning to be visible, the difference between “interesting” and “uninteresting”
problem instances, and the difficulty of uncovering the user’s true preferences or capturing them by
approximate metrics.

A central aspect, therefore, of our ongoing work is to design experiments that are able to
answer the above questions. Toward this goal, we have begun designing a test harness that will
allow us to quickly gather new data and experiment with various preference models and learning
algorithms. The harness will enable more realistic validation before deployment and will help
characterize how well different model and algorithm combinations work for different types of
scheduling instances.

In ongoing work we are considering the use of multiple complementary preference models.
For instance, deploying a simple 1-order Choquet model that is more readily elicited and refined,
for under-constrained problem instances, in conjunction with the existing 2-order model that
is able to capture criteria trade-offs, for critically- and over-constrained instances where the
interaction between criteria is a much more significant factor.

On the offline learning side, we are investigating the merits of more direct, albeit heavyweight,
elicitation of the Choquet complementarity and substitutability degrees between criteria, via
methods from decision theory. On the online learning side, we are investigating hierarchical
learning over the criteria within the Choquet U_i functions as well as over the aggregated F
function. We are also exploring adaptive presentation of the candidate set of schedule options,
including learning what options to present (e.g., varying the diversity of the set according to
measures of its informational entropy [42]).

Other avenues of interest include learning the importance of a meeting to an individual and of
an individual to a meeting (although not described in this report, PTIME has an initial capability
in both areas), predicting the likelihood of a participant’s attendance at a meeting (as for instance
done by [40]), facilitating negotiation with other users to refine or modify the organizer’s chosen
meeting option, exploring visualization of others’ preferences and predicted attendance ([3, 6]),
and transfer learning of typical preference models between users ([3]). Last, we are working on
explaining the system’s conclusions based on the Choquet model, in the context of expanding
our work from meeting scheduling to wider calendar management and negotiation processes.
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